Distributed Fault Tolerance - Lessons Learned from Delta-4

Author

  • David Powell

Abstract

Software-implemented approaches to fault tolerance are very resilient to change since evolution in hardware technology does not require extensive re-design of specialized hardware. This paper argues the case for implementing fault tolerance in a distributed fashion and reports the approach adopted in the European Delta-4 project. Fault tolerance is achieved by replicating capsules (the run-time representation of application objects) on distributed nodes interconnected by a local area network. Capsule groups can be configured to tolerate either stopping failures or arbitrary failures. Multipoint protocols are used for coordinating capsule groups and for error processing and fault treatment. The paper concludes with a critical analysis of the project’s results.

Similar resources

The Delta-4 approach to dependability in open distributed computing systems

As part of the European Strategic Programme for Research in Information Technology (ESPRIT), the Delta-4 project is seeking to define an open, fault-tolerant, distributed computing architecture. This paper presents the overall Delta-4 framework for open, fault-tolerant, distributed computing systems and sketches the current implementation which is based on a local area networ...

Lessons Learned from Building and Using the Arjuna Distributed Programming System

Arjuna is an object-oriented programming system implemented in C++ that provides a set of tools for the construction of fault-tolerant distributed applications. This paper reports on the experience gained by building and using the system. It then describes how in light of this experience, a new version of the system is being designed.

Framework to Enable Scalable and Distributed Application Development: Lessons Learned While Developing the Opportunistic Seamless Localization System

In real-time middleware, latency is a critical aspect. When the input rate exceeds a certain threshold, queuing will result in an exponentially increasing delay. Distributed computing enables scaling so that this growing latency is kept at a constant minimum. When developing a generic framework, redundancy, platform independence, fault tolerance and transparency are important features that need...

Fellowship Winner: The Structure And Evolution Of A Distributed Measurement Framework

Distributed systems are becoming increasingly important in day-to-day computing tasks. While building such systems is a well understood process, measuring and monitoring such systems is not. This paper describes the difficulties inherent in measuring distributed systems, and enumerates four goals for such measurement: longevity, flexibility, fault-tolerance, and unintrusiveness. It then describes...

Publication date: 1993